In this project, we are asked to implement the Perceptron model.
In this question, we are asked to generate 60 arbitrary data points with equal class sizes from the first and the eighth octants of 3D space, of which 40 are for training and 20 are for testing. I generated 30 data points per class from a uniform distribution: between 0.01 and 1 for the first octant and between -1 and -0.01 for the eighth octant. Then, for each class, I randomly assigned 20 of the points to the training set and the remaining 10 to the test set.
Throughout the project I will treat the observations from the first octant as class 1 and the observations from the eighth octant as class -1.
library(data.table)
library(plotly)
library(reshape2)
set.seed(66)
#generating data points
class1_x=runif(30,0.01,1)
class1_y=runif(30,0.01,1)
class1_z=runif(30,0.01,1)
#assigning random numbers from 1 to 30 so that the training/test split is random
train_test=sample(30)
class1=as.data.table(cbind(1,class1_x,class1_y,class1_z,train_test))
setnames(class1,"V1","constant")
class1[,is_train:=ifelse(train_test<=20,1,0)]
class1[,train_test:=NULL]
setnames(class1,old=c("class1_x","class1_y","class1_z"),new=c("x","y","z"))
class1[,class:=1] #class 1
#generating data points
class2_x=runif(30,-1,-0.01)
class2_y=runif(30,-1,-0.01)
class2_z=runif(30,-1,-0.01)
#assigning random numbers from 1 to 30 so that the training/test split is random
train_test=sample(30)
class2=as.data.table(cbind(1,class2_x,class2_y,class2_z,train_test))
setnames(class2,"V1","constant")
class2[,is_train:=ifelse(train_test<=20,1,0)]
class2[,train_test:=NULL]
setnames(class2,old=c("class2_x","class2_y","class2_z"),new=c("x","y","z"))
class2[,class:=-1] #class -1
data=rbind(class1,class2)
In this question, we are asked to plot all the observations.
fig <- plot_ly(data, x = ~x, y = ~y, z = ~z, color = ~as.factor(class), colors = c('#BF382A', '#0C4B8E'))
fig <- fig %>% add_markers()
fig <- fig %>% layout(title="Class 1 vs Class -1")
fig
Next, the perceptron model is initialized:
I initialize the weights, including the bias term theta, randomly from a uniform distribution between -1 and 1. I set the learning rate to 0.2. With higher learning rates (still below 1), the iterations converge very quickly, so I deliberately chose 0.2 to better observe how the cost function changes in a later question.
Before starting the algorithm, I shuffle the data so that observations are picked one by one in random order, because the data is ordered with all class 1 observations first and all class -1 observations after. With the initial weights, I calculate the cost function for iteration 0.
When assigning a class, if the predicted value is greater than 0, I assign the observation to class 1; if it is less than or equal to 0, I assign it to class -1. For the first iteration, I calculated the predicted values with the initial weights.
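This sign rule can be sketched as a small standalone helper (the function name `perceptron_predict` is hypothetical and not part of the project code below):

```r
# Hypothetical helper: predict +1 when the weighted sum (including the
# constant term) is positive, and -1 otherwise.
perceptron_predict <- function(weights, features) {
  activation <- sum(weights * features)
  if (activation > 0) 1 else -1
}

# A point in the first octant with all-positive weights is assigned class 1:
perceptron_predict(c(0.1, 0.2, 0.3, 0.4), c(1, 0.5, 0.5, 0.5))  # 1
```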
#initialization
set.seed(66)
#weight initialization
weights=t(as.matrix(runif(4,-1,1))) #constant,weight1,weight2,weight3
learning_rate=0.2
#train and test split
data_train=data[is_train==1]
data_test=data[is_train==0]
#shuffling the data
rows <- sample(nrow(data_train))
data_train_shuffled <- data_train[rows, ]
#cost function calculation
cost_summary=data.table()
i=0
data_train_shuffled[,predicted:=constant*weights[1]+x*weights[2]+y*weights[3]+z*weights[4]]
#assigning class
data_train_shuffled[,predicted_class:=ifelse(predicted>0,1,-1)]
#cost
data_train_shuffled[,cost_contribution:=(class-predicted_class)^2]
cost_iter=(1/2)*sum(data_train_shuffled$cost_contribution)
cost_summary=rbind(cost_summary,data.table(iteration=i,cost=cost_iter))
In this question, the weights are updated using the 40 training samples:
for (i in 1:40){
observation=data_train_shuffled[i]
#predicted value calculation
y=as.matrix(weights)%*%t(as.matrix(observation[,1:4]))
#class assignment (y>0 :class 1 & y<=0 :class -1)
if (y>0){
y=1
}else{
y=-1
}
#updating weights
weights=weights+learning_rate*(observation$class-y)*observation[,1:4]
#new y values calculation with the updated weights
data_train_shuffled[,predicted:=constant*weights$constant+x*weights$x+y*weights$y+z*weights$z]
#assigning class
data_train_shuffled[,predicted_class:=ifelse(predicted>0,1,-1)]
#cost calculation
data_train_shuffled[,cost_contribution:=(class-predicted_class)^2]
cost_iter=(1/2)*sum(data_train_shuffled$cost_contribution)
cost_summary=rbind(cost_summary,data.table(iteration=i,cost=cost_iter))
}
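To make the update rule inside the loop concrete, here is a single update step on one misclassified point, using made-up weights and a made-up observation rather than the trained values:

```r
# One perceptron update on a single misclassified point (made-up numbers).
learning_rate <- 0.2
w <- c(0.5, -0.2, 0.1, 0.3)    # constant, w1, w2, w3
x <- c(1, -0.4, -0.6, -0.7)    # constant term plus the x, y, z coordinates
target <- -1                   # true class of this observation

activation <- sum(w * x)       # 0.31 > 0, so the predicted class is 1
predicted <- if (activation > 0) 1 else -1

# update rule: w <- w + learning_rate * (target - predicted) * x
w_new <- w + learning_rate * (target - predicted) * x
w_new                          # 0.10 -0.04  0.34  0.58

sum(w_new * x)                 # -0.494: the point is now classified as -1
```

Note that when the point is classified correctly, `target - predicted` is 0 and the weights are unchanged.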
First, classes are assigned with the last updated weights. The separating plane is:
w1*x1 + w2*x2 + w3*x3 + constant = 0, so the z coordinate of the plane is:
x3 = -(w1/w3)*x1 - (w2/w3)*x2 - (constant/w3)
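As a quick sanity check of this rearrangement (using made-up weights, not the trained ones), a point whose z coordinate comes from the formula does satisfy the plane equation:

```r
# Made-up weights for illustration: constant, w1, w2, w3
w_const <- 0.05; w1 <- 0.6; w2 <- 0.7; w3 <- 0.8
x1 <- 0.3; x2 <- -0.4

# z coordinate from the rearranged plane equation
x3 <- -(w1 / w3) * x1 - (w2 / w3) * x2 - (w_const / w3)

# plugging (x1, x2, x3) back into the plane equation gives 0
w1 * x1 + w2 * x2 + w3 * x3 + w_const  # ~0 up to floating-point rounding
```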
#class assignment
data_train[,predicted:=constant*weights$constant+x*weights$x+y*weights$y+z*weights$z]
data_train[,predicted_class:=ifelse(predicted>0,1,-1)]
#preprocessing steps for drawing plane
graph_resolution <- 0.01
x_axis <- seq(min(data_train$x), max(data_train$x), by = graph_resolution)
y_axis <- seq(min(data_train$y), max(data_train$y), by = graph_resolution)
surface <- expand.grid(x = x_axis,y = y_axis,KEEP.OUT.ATTRS = F)
#z coordinate calculation for the plane with the x and y value range
surface$z <- (((-1*weights$x)/weights$z)*surface$x)-((weights$y/weights$z)*surface$y)-(weights$constant/weights$z)
surface <- acast(surface, y ~ x, value.var = "z") #y ~ x
fig = plot_ly(data_train, x = ~x, y = ~y, z = ~z, color = ~as.factor(class), colors = c('#BF382A', '#0C4B8E'))
fig = fig %>% add_markers()
fig <- add_trace(p = fig,
z = surface,
x = x_axis,
y = y_axis,
type = "surface")
fig = fig %>% layout(title="Class 1 vs Class -1")
#fig = fig %>% layout(zaxis = list(range = c(-1, 1)))
fig
It can be seen that the training points are separated by the resulting plane, so they are linearly separable.
Here, predictions are obtained for the test set, and the whole set is plotted together with the plane found above.
#class assignment
data_test[,predicted:=constant*weights$constant+x*weights$x+y*weights$y+z*weights$z]
data_test[,predicted_class:=ifelse(predicted>0,1,-1)]
fig = plot_ly(data_test, x = ~x, y = ~y, z = ~z, color = ~as.factor(class), colors = c('#BF382A', '#0C4B8E'))
fig = fig %>% add_markers()
fig <- add_trace(p = fig,
z = surface,
x = x_axis,
y = y_axis,
type = "surface")
fig = fig %>% layout(title="Class 1 vs Class -1")
fig
The test set is also well separated by the decision surface. The cost function is also computed for the test set:
#cost
data_test[,cost_contribution:=(class-predicted_class)^2]
cost_test=(1/2)*sum(data_test$cost_contribution)
cost_test
## [1] 0
It is 0, as expected from the plot; all test points are classified correctly.
The cost function through iterations is plotted:
plot(cost_summary$iteration,cost_summary$cost,type="l",
main="Cost Function vs Iteration Index",
xlab="Iteration Index",
ylab="Cost Function")
It can be seen that the cost function either decreases or stays the same as the weights are updated, and it reaches 0 at the end.
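A closing observation on the scale of this cost: since both class and predicted_class take values in {-1, 1}, each term (class - predicted_class)^2 is 0 for a correct prediction and 4 for an error, so (1/2) times the sum is exactly twice the number of misclassified points. A toy check with hypothetical label vectors:

```r
class_true <- c(1, 1, -1, -1, 1)
class_pred <- c(1, -1, -1, 1, 1)  # two misclassifications
cost <- (1/2) * sum((class_true - class_pred)^2)
cost  # 4, i.e. 2 * (number of misclassifications)
```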